33 research outputs found

    Back to the Future: Logic and Machine Learning

    In this paper we argue that there has been a strong connection between logic and machine learning since the beginning of natural language processing, or computational linguistics. First, there is something logical about language, or linguistic about logic. Second, we argue that rather than distinguishing between logic and machine learning, a more useful distinction is between top-down approaches and data-driven approaches. Examining some recent approaches in deep learning, we argue that they incorporate both properties and that this is the reason for their very successful adoption for solving several problems within language technology.

    A Model for Attention-Driven Judgements in Type Theory with Records

    This paper makes three contributions to the discussion on the applicability of Type Theory with Records (TTR) to embodied dialogue agents. First, it highlights the problem of type assignment, or judgements, in practical implementations, which is resource-intensive. Second, it presents a judgement control mechanism that addresses this problem: types are grouped into clusters or states by their thematic relations, and types are then selected following two mechanisms inspired by the Load Theory of selective attention and cognitive control (Lavie et al., 2004). Third, it presents a computational framework, based on Bayesian inference, that offers a basis for future practical experimentation on the feasibility of the proposed approach.
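
    A minimal sketch (in Python) of one way to read this judgement-control idea: types grouped into thematic clusters, a Bayesian posterior over which cluster is currently relevant, and a load-dependent selection step. The clusters, the load threshold and the update rule below are illustrative assumptions, not the paper's implementation.

    # Illustrative sketch only: the type clusters, the load heuristic and the
    # Bayesian update are assumptions made for this example.

    # Types grouped into thematic clusters ("states"); judging every type
    # against every percept is the resource problem the paper points to.
    CLUSTERS = {
        "kitchen": ["Cup", "Table", "Kettle"],
        "street": ["Car", "Bicycle", "TrafficLight"],
    }

    def bayes_update(prior, likelihoods):
        """Posterior over clusters given per-cluster likelihood of the percept."""
        unnorm = {c: prior[c] * likelihoods.get(c, 1e-6) for c in prior}
        z = sum(unnorm.values()) or 1.0
        return {c: p / z for c, p in unnorm.items()}

    def select_types(posterior, perceptual_load, k=1):
        """Under high perceptual load only the top-k clusters are judged
        (early selection); under low load all clusters are let through."""
        ranked = sorted(posterior, key=posterior.get, reverse=True)
        active = ranked[:k] if perceptual_load > 0.5 else ranked
        return [t for c in active for t in CLUSTERS[c]]

    prior = {"kitchen": 0.5, "street": 0.5}
    posterior = bayes_update(prior, {"kitchen": 0.9, "street": 0.2})
    print(select_types(posterior, perceptual_load=0.8))  # only kitchen types judged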

    What is not where: the challenge of integrating spatial representations into deep learning architectures

    This paper examines to what degree current deep learning architectures for image caption generation capture spatial language. On the basis of the evaluation of examples of generated captions from the literature we argue that systems capture what objects are in the image data but not where these objects are located: the captions generated by these systems are the output of a language model conditioned on the output of an object detector that cannot capture fine-grained location information. Although language models provide useful knowledge for image captions, we argue that deep learning image captioning architectures should also model geometric relations between objects. Comment: 15 pages, 10 figures. Appears in CLASP Papers in Computational Linguistics Vol 1: Proceedings of the Conference on Logic and Machine Learning in Natural Language (LaML 2017), pp. 41-5
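
    A toy sketch (in Python) of the kind of pipeline the abstract describes: the object detector returns bounding boxes, but only a bag of labels is used to condition the caption generator, so geometric relations between objects never reach the language model. The detector, labels and template "language model" here are invented for illustration, not taken from any of the systems discussed.

    from collections import namedtuple

    Detection = namedtuple("Detection", ["label", "box"])  # box = (x1, y1, x2, y2)

    def detect_objects(image):
        # Stand-in for an object detector; the labels and boxes are made up.
        return [
            Detection("cat", (10, 40, 80, 120)),
            Detection("sofa", (0, 60, 200, 160)),
        ]

    def condition_on_detections(detections):
        # Only the set of labels reaches the language model: the boxes, and
        # hence relations such as "on" versus "next to", are discarded here.
        return sorted({d.label for d in detections})

    def generate_caption(conditioning):
        # Toy "language model": a template over the bag of detected objects.
        return "A picture of a " + " and a ".join(conditioning) + "."

    print(generate_caption(condition_on_detections(detect_objects(image=None))))
    # -> "A picture of a cat and a sofa."  (what is in the image, not where)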

    Modular Mechanistic Networks: On Bridging Mechanistic and Phenomenological Models with Deep Neural Networks in Natural Language Processing

    Natural language processing (NLP) can be done using either top-down (theory-driven) or bottom-up (data-driven) approaches, which we call mechanistic and phenomenological respectively. The two approaches are frequently considered to stand in opposition to each other. Examining some recent approaches in deep learning, we argue that deep neural networks incorporate both perspectives and, furthermore, that leveraging this aspect of deep learning may help in solving complex problems within language technology, such as modelling language and perception in the domain of spatial cognition.

    Exploring the Functional and Geometric Bias of Spatial Relations Using Neural Language Models

    The challenge for computational models of spatial descriptions for situated dialogue systems is the integration of information from different modalities. The semantics of spatial descriptions are grounded in at least two sources of information: (i) a geometric representation of space and (ii) the functional interaction of the related objects. We train several neural language models on descriptions of scenes from a dataset of image captions and examine whether the functional or geometric bias of spatial descriptions reported in the literature is reflected in the estimated perplexity of these models. The results of these experiments have implications for the creation of models of spatial lexical semantics for human-robot dialogue systems. Furthermore, they also provide an insight into the kinds of semantic knowledge captured by neural language models trained on spatial descriptions, which has implications for image captioning systems.
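
    A minimal sketch (in Python) of the evaluation step described above: compare the perplexity a trained language model assigns to geometric versus functional uses of a spatial preposition such as "over". The add-one-smoothed bigram model and the example captions are stand-ins chosen for illustration; the paper trains neural language models on an image-caption dataset.

    import math
    from collections import Counter

    def train_bigram(corpus):
        """Maximum-likelihood bigram model with add-one smoothing."""
        unigrams, bigrams, vocab = Counter(), Counter(), set()
        for sent in corpus:
            toks = ["<s>"] + sent.split()
            vocab.update(toks)
            unigrams.update(toks[:-1])
            bigrams.update(zip(toks[:-1], toks[1:]))
        V = len(vocab)
        return lambda prev, tok: (bigrams[(prev, tok)] + 1) / (unigrams[prev] + V)

    def perplexity(prob, sentence):
        toks = ["<s>"] + sentence.split()
        log_p = sum(math.log(prob(a, b)) for a, b in zip(toks[:-1], toks[1:]))
        return math.exp(-log_p / (len(toks) - 1))

    captions = [
        "the cup is over the table",     # geometric use of "over"
        "the umbrella is over the man",  # functional use of "over"
        "the lamp is over the table",
    ]
    lm = train_bigram(captions)
    # Lower perplexity means the description fits the model's learned bias better.
    print(perplexity(lm, "the cup is over the table"))
    print(perplexity(lm, "the umbrella is over the table"))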

    Local Alignment of Frame of Reference Assignment in English and Swedish Dialogue

    In this paper we examine how people assign, interpret, negotiate and repair the frame of reference (FoR) in online text-based dialogues discussing spatial scenes in English and Swedish. We describe our corpus and data collection, which involve a coordination experiment in which dyadic dialogue participants have to identify differences in their pictures of a visual scene. As their perspectives on the scene are different, they must coordinate their FoRs in order to complete the task. Results show that participants do not align on a global FoR, but tend to align locally, for sub-portions (or particular conversational games) of the dialogue. This has implications for how dialogue systems should approach problems of FoR assignment, and for what strategies for clarification they should implement.